1 Introduction


Ofqual has contracted the National Foundation for Educational Research (NFER) to develop, 
administer and analyse the National Reference Test in English and maths. The first National 
Reference Test took place in 2017 and established a baseline from which any future changes in 
standards can be detected. This report presents an overview of the findings of the 2019 testing 
process. 


The National Reference Test (NRT), which consists of a series of test booklets, provides evidence 
on changes in the performance standards of the same content that is tested in GCSE English 
language and maths in England at the end of key stage 4. It has been designed to provide 
additional information to support the awarding of GCSEs in English language and maths and is 
based on a robust and representative sample of Year 11 students who will, in the relevant year, 
take their GCSEs. 


More information about the NRT can be found in the NRT document collection:
https://www.gov.uk/government/collections/national-reference-test-information


The first live NRT took place in late February and early March 2017. The outcomes of the 2017 
GCSE examinations for that year provided the baseline percentages at three grade boundaries 
and these were mapped to the NRT for 2017 to establish the corresponding proficiency level. The 
percentages of students achieving those proficiency levels in each subsequent year are calculated 
and compared. 


The National Reference Test structure is intended to remain the same each year. For each of 
English and maths there are eight test booklets in use. All questions are used in two booklets, so 
that effectively all the tests can be analysed together to give a single measure of subject 
performance. This is similar to other studies that analyse trends in performance nationally, for 
example, international surveys such as PISA and TIMSS. 
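
The overlapping booklet structure described above can be illustrated with a small sketch. The cyclic arrangement, the block labels and the function name `build_booklets` are assumptions made for illustration only; the report states just that there are eight booklets per subject and that every question appears in two of them.

```python
from collections import Counter


def build_booklets(n_booklets=8):
    """Return a hypothetical cyclic design: booklet i holds blocks i and i+1."""
    return {
        f"booklet_{i + 1}": [f"block_{i + 1}", f"block_{(i + 1) % n_booklets + 1}"]
        for i in range(n_booklets)
    }


booklets = build_booklets()

# Count how many booklets each block of questions appears in.
usage = Counter(block for blocks in booklets.values() for block in blocks)

for name, blocks in booklets.items():
    print(name, blocks)
print("Every block used twice:", all(count == 2 for count in usage.values()))
```

In a design of this kind, neighbouring booklets share a block of questions, so every booklet is connected to every other through a chain of common questions. That connection is what allows all the tests to be analysed together on one scale.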


This report provides summarised information of the key performance outcomes for English and 
maths and provides information on the changes from the baseline standards established in 2017. It 
also includes data on the achievement of the samples, their representativeness and the 
performance of the students on the tests. Further information on the nature of the tests, the 
development process, the survey design and its conduct, and the analysis methods used is 
provided in the accompanying document: Background Report: National Reference Test 
Information. 
2 The sample


The NRT took place between 25 February and 8 March 2019. The numbers of participating 
schools and students are shown in Table 2.1. These were an increase on 2018, when the testing 
period had been disrupted by adverse weather. 


Table 2.1 Target sample sizes and achieved samples in current and previous years


                         Target     Achieved sample
                         sample     2019      2018      2017

English
  Number of schools      330        332       312       339
  Number of students     7920       6740      6193      7082
Mathematics
  Number of schools      330        331       307       340
  Number of students     7920       6826      6169      7144


The sample was stratified by the previous attainment of schools in GCSE English language and 
maths and also by school size. In addition, the types of school were monitored. Checks were made 
on all three of these variables to ensure that the achieved sample was close to that drawn in the 
sampling frame. This was generally the case, but there was an under-representation of independent 
schools in the achieved sample, likely because their participation is voluntary. This may have 
resulted in the final sample of schools being slightly lower attaining than the national population. 
This was also true in 2018. Given that the sample for the NRT will be drawn on the same basis 
every year, this arrangement will remain constant each year so it will not impact on the usability of 
the results. 


Table 2.2 shows the number of students in the final sample for whom booklets were dispatched and 
the number completing the tests for both English and maths. As this shows, around 85 per cent of 
students who were selected took part in the tests. This was a high participation rate and consistent 
with the rates achieved in 2017 and 2018. 
Table 2.2 Completed student test returns for English and maths 2019


Test type    No. of students:     No. of students:    % of students:
             dispatched tests     completed tests     completed tests
English      7928                 6739                85
Maths        7917                 6825                86


In total 1,189 students from 303 schools were recorded as non-attendees during the English NRT, 
which is 15 per cent of the total number of 7,928 sampled students spread across 90 per cent of 
the schools participating in the survey. Similarly, 1,092 students from 296 schools were recorded 
as non-attendees during the maths NRT, which is 14 per cent of the total number of 7,917 sampled 
students spread across 84 per cent of the schools participating in the test. 
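
The non-attendance figures quoted above follow directly from Table 2.2. As a minimal check of the arithmetic (the dictionary layout and the function name `attendance_summary` are illustrative, not part of the NRT analysis):

```python
# Figures from Table 2.2: tests dispatched and tests completed in 2019.
returns = {
    "English": {"dispatched": 7928, "completed": 6739},
    "Maths": {"dispatched": 7917, "completed": 6825},
}


def attendance_summary(dispatched, completed):
    """Return the number of non-attendees and both rates as whole percentages."""
    absent = dispatched - completed
    return {
        "non_attendees": absent,
        "pct_completed": round(100 * completed / dispatched),
        "pct_absent": round(100 * absent / dispatched),
    }


summary = {
    subject: attendance_summary(**counts) for subject, counts in returns.items()
}
for subject, row in summary.items():
    print(subject, row)
```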


The principal reason given for non-attendance was absence due to illness or other authorised 
reason, which covered about 56 per cent of non-attendance. Sixteen per cent were in school but 
did not attend the testing; about eight per cent were withdrawn by the headteacher and another six 
per cent had left the school. 


The percentage of non-attendance in 2019 was very similar to that in 2018. A high student 
participation rate is needed to ensure the precision of the estimated results. An 85 per cent 
attendance rate is considered high when compared with other monitoring tests such as 
international large-scale assessments, and there was no evidence that the pattern of 
non-attendance was skewed to particular school types, for example, those with lower performance 
in previous English language and maths GCSEs. 


The NRT offers access arrangements consistent with JCQ requirements (for GCSE examinations) in 
order to make the test accessible to as many sampled students as possible. Schools were asked to 
contact NFER in advance of the NRT to indicate whether any of their students required modified test 
materials or if students’ normal working practice was to use a word processor or laptop during 
examinations. In cases where additional time would be needed for particular students, schools were 
asked to discuss this need with the NFER test administrator and ensure that the extra time for the 
testing session could be accommodated. All requests from schools for access arrangements and 
the type of arrangement required were recorded. Table 2.3 below shows the different types of 
access arrangements that were provided to students for the NRT in 2019. These were very similar 
to 2018. 
Table 2.3 Number of access arrangements provided 2019


Arrangement provided            No. of students
                                English    Maths    Total
Word processor                  290        104      394
Different colour test paper     97         107      204
Modified enlarged print         19         20       39
Enlarged copies                 5          6        11
Braille                         1          0        1
Total                           412        237      649
3 Results for the test booklets in 2019 


Details of the analysis procedures are given in the accompanying document: Background Report: 
National Reference Test Information. The analysis process followed a sequence of steps. 
Initially the tests were analysed using Classical Test Theory to establish that they had performed 
well, with appropriate difficulty and good levels of reliability. The subsequent analyses used Item 
Response Theory techniques to link all the tests together and estimate the ability of all the students 
on a common scale for each subject, independent of the test or items they had taken. These ability 
estimates were then used for calculating the ability level at the percentiles associated with the 
GCSE grade boundaries in 2017. From 2018 onwards, the percentages of students achieving 
above these baseline ability levels are established from the survey. 


English 


The results of the Classical Test Theory analyses are summarised in Table 3.1. This shows the 
main test performance statistics averaged for the eight English test booklets used. 


Table 3.1: Summarised Classical Test Theory Statistics for the English Tests in 2019 


Average Number of Students Taking Each Test Booklet      842
Average Score Attained                                   19.7
Average Reliability of the Tests (Coefficient Alpha)     0.78


These results show that the English test booklets functioned well. The maximum scores attained 
were near the total marks available for the booklet, although few students 
attained scores over 40. The average scores were somewhat less than half marks. The standard 
deviation shows that the scores were well spread out, allowing discrimination between the 
students. This is confirmed by the reliability coefficients which are at a good level for an English 
test of this length. Finally, the average percentage of items attempted by the students at over 90 
per cent indicates that the students were engaging with the test and attempting to answer the 
majority of questions. 
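
For readers unfamiliar with the reliability statistic reported in Table 3.1, the following is a minimal sketch of how coefficient alpha is calculated from item scores. The function name `coefficient_alpha` and the marks shown are hypothetical and are not NRT data.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of students, each a list of item scores."""
    n_items = len(scores[0])
    # Variance of each item across students.
    item_variances = [
        variance([student[i] for student in scores]) for i in range(n_items)
    ]
    # Variance of the students' total scores.
    total_variance = variance([sum(student) for student in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical marks for five students on four items (not NRT data).
example_scores = [
    [2, 1, 3, 2],
    [1, 0, 2, 1],
    [3, 2, 3, 3],
    [0, 0, 1, 1],
    [2, 2, 2, 3],
]
alpha = coefficient_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items vary together, that is, when students who do well on one item tend to do well on the others.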


These results were confirmed by the distribution of scores which students achieved on the tests. 
This is shown for one test booklet in Figure 3.1; the distributions were similar for the other 
booklets. The figure shows a good spread of scores across nearly all of the available marks, with 
students fairly evenly spread over the range, although no students attained the very highest 
marks. 


Figure 3.1: Score Distribution for one of the English Tests 


[Figure: "Observed score distribution E19A02", mean = 20.071. Horizontal axis: observed score; 
vertical axis: frequency.]


In addition, a full item analysis was carried out for each test, in which the difficulty of every question 
and its discrimination were calculated. These indicated that all the questions had functioned either 
well or, in a small number of cases, adequately and there was no need to remove any items from 
the analyses. Therefore all were retained for the Item Response Theory (IRT) analyses. 
Additionally, an analysis was conducted to establish if any items had performed markedly 
differently in 2019 compared with the two previous years. Where there are such indications, a 
formal procedure is followed for reviewing such items to establish whether there could be an 
external reason for the change. For 2019, one English item was removed from the link between 
2017/18 and 2019. 
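
As an illustration of the item analysis described above, the sketch below calculates a difficulty index and a discrimination index for each item. The report does not state which indices were used; facility (mean mark as a proportion of the maximum) and the item-rest correlation are assumed here as common choices, and the marks are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_statistics(scores, max_marks):
    """Facility and item-rest discrimination for each item."""
    results = []
    for i, max_mark in enumerate(max_marks):
        item = [student[i] for student in scores]
        # Score on the rest of the test, excluding the item itself.
        rest = [sum(student) - student[i] for student in scores]
        results.append({
            "item": i + 1,
            "facility": sum(item) / (len(item) * max_mark),
            "discrimination": pearson(item, rest),
        })
    return results


# Hypothetical marks for five students on four items (not NRT data).
example_scores = [
    [2, 1, 3, 2],
    [1, 0, 2, 1],
    [3, 2, 3, 3],
    [0, 0, 1, 1],
    [2, 2, 2, 3],
]
stats = item_statistics(example_scores, max_marks=[3, 2, 3, 3])
for row in stats:
    print(row)
```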


Using the common items, the IRT analyses equated the eight tests. The IRT analyses also used 
the items common between years¹ to equate the tests over years, allowing ability estimates for 
students in all three years to be on the same scale. After this had been done, the results showed 
that the mean ability scores for students were very similar for all the tests, confirming that the 
random allocation to tests had been successful. The results also showed that the level of difficulty 
of the eight tests was fairly consistent, with only small differences between them. 


1 The 2019 version of the NRT contained the same items as those used in the 2017 and 2018 tests. 


Both the Classical Test Theory results and the Item Response Theory results for the English tests 
showed that these had functioned well to provide good measures of the ability of students, 
sufficient for estimating averages for the sample as a whole. 


Maths 


The results of the Classical Test Theory analyses are summarised in Table 3.2. This shows the 
main test performance statistics averaged for the eight maths tests used. 


Table 3.2: Summarised Classical Test Theory Statistics for the Maths Tests in 2019 


Average Number of Students Taking Each Test Booklet      853
Average Score Attained                                   22.7
Average Reliability of the Tests (Coefficient Alpha)     0.90


These results show that the maths tests also functioned well. The maximum scores attained were 
just short of the total score possible, and for five tests a small number of students did attain full 
marks. (This is more likely in maths than English.) The average scores were again slightly less 
than half marks. The standard deviation shows that the scores were well spread out, allowing 
discrimination between the students. This is confirmed by the reliability coefficients which are at a 
good level for a maths test of this length and higher than for English, which again is usual. Finally, 
the average percentage of items attempted by the students at 88 per cent indicates that the 
students were engaging with the test and attempting to answer the majority of questions, although 
to a lesser extent than for the English test. However, there are more items for students to attempt 
in the maths test. 


These results were confirmed by the distribution of scores which students achieved on the tests. 
This is shown for one of the tests in Figure 3.2. The distributions were similar for the other tests. 
The figure shows that scores were attained over all of the possible marks and that the students 
were fairly evenly spread over the range. 
Figure 3.2: Score Distribution for one of the Maths Tests 


[Figure: "Observed score distribution M19A06", mean = 21.38, sd = 11.795, n = 857. Horizontal 
axis: observed score; vertical axis: frequency.]


In addition, a full item analysis was carried out for each test, in which the difficulty of every question 
and its discrimination were calculated. These indicated that all the questions had functioned either 
well or, in a small number of cases, adequately. There was no need to remove any items from the 
analyses. All were retained for the Item Response Theory (IRT) analyses. Additionally, an analysis 
was conducted to establish if any items had performed markedly differently in 2019 compared with 
the two previous years. Where there are such indications, a formal procedure is followed for 
reviewing such items to establish whether there could be an external reason for the change and if 
there is sufficient evidence to remove the item from the link between years. In 2019, no items had 
to be removed. 


Using the common items, the IRT analyses equated the eight tests. The IRT analyses also used 
the items common between years² to equate the tests over years, allowing ability estimates for 
students in all three years to be on the same scale. After this had been done, the results showed 
that the mean ability scores for students were very similar for all the tests, confirming that the 
random allocation to tests had been successful. The results also showed that the level of difficulty 
of the eight tests was fairly consistent, with only small differences between them. 


2 The 2019 version of the NRT contained the same items as those used in the 2017 and 2018 tests. 
Both the Classical Test Theory results and the Item Response Theory results for the maths tests 
showed that these had functioned well to provide good measures of the ability of students, 
sufficient for estimating averages for the sample as a whole. 


Summary 


These initial stages of the analyses, the Classical Test Theory evaluation of test functioning and 
the Item Response Theory equating of the tests, indicate that the NRT performed well. This 
allowed the final stages of the analysis, the estimation of the percentages of students above the 
same ability thresholds as in 2017 and the calculation of their precision to be undertaken with 
confidence. These are described in Sections 4 and 5 for English and maths respectively. 
4 Performance in English in 2019 


The objective of the National Reference Test is to get precise estimates of the percentages of 
students each year achieving at a level equivalent to three key GCSE grades in 2017: these key 
grades are 4, 5 and 7. For the NRT in 2017, these baseline percentages were established from the 
2017 GCSE population percentages. The NRT ability distribution, based on the IRT analysis, was 
then used to establish the ability thresholds which corresponded to those percentages. From 2018 
onwards, they will correspond to the same student ability as those of 2017, thus allowing the 
tracking of performance. Alongside this, based on the sample achieved and the reliability of the 
tests, we are able to model the level of precision with which the proportion of students achieving 
the ability thresholds can be measured. The target for the NRT is to achieve a 95% confidence 
interval of plus or minus not more than 1.5 percentage points from the estimate at each ability 
threshold. 


Ofqual provided the percentages of students at or above the three relevant grades (grades 4, 5 
and 7) taken from the 2017 GCSE population. These are shown in Table 4.1. These percentages 
were mapped to three ability threshold scores in the NRT in 2017. 


Table 4.1 English 2017 NRT Baseline Thresholds 


Threshold             Percentage of students above threshold from 2017 GCSE
Grade 7 and above     16.8
Grade 5 and above     53.3
Grade 4 and above     69.9


In 2019, the NRT data for the years 2017 to 2019 were analysed together using Item Response 
Theory (IRT) modelling techniques. By analysing all the data concurrently, ability distributions 
could be produced for the 2017, 2018 and 2019 samples on the same scale. The percentages of 
students at each of the three GCSE grade boundaries, fixed on the 2017 distribution, could then be 
mapped onto the 2018 and 2019 distributions to produce estimates of the percentage of students 
at the same level of ability in those years. For example, the percentage of pupils at the ‘Grade 4 
and above’ threshold in the 2017 GCSE population was 69.9 per cent. This was mapped onto the 
2017 distribution to read off an ability value at that grade boundary. The same ability value on the 
2018 and 2019 distributions can then be found, and the percentage of students at this threshold or 
above in those years established. In this way, we are able to estimate the percentage of students 
at the same level of ability as represented in the 2017 GCSE population, for each year of the NRT 
going forward. The precision of these estimates is dependent on both the sample achieved and the 
reliability of the tests as measures. 
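
The mapping described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the ability values are simulated rather than NRT estimates, an empirical quantile stands in for the IRT-based ability distribution, and the function names are hypothetical.

```python
import random


def ability_threshold(abilities, target_pct):
    """Ability value with target_pct per cent of the sample at or above it."""
    ordered = sorted(abilities, reverse=True)
    count = round(len(ordered) * target_pct / 100)
    return ordered[count - 1]


def pct_at_or_above(abilities, threshold):
    """Percentage of the sample whose ability is at or above the threshold."""
    return 100 * sum(a >= threshold for a in abilities) / len(abilities)


# Simulated ability estimates (NOT NRT data): a baseline year and a later
# year whose mean is slightly lower on the same scale.
rng = random.Random(2017)
baseline_year = [rng.gauss(0.0, 1.0) for _ in range(7000)]
later_year = [rng.gauss(-0.1, 1.0) for _ in range(7000)]

# 69.9 per cent is the 'Grade 4 and above' baseline from Table 4.1.
grade4_threshold = ability_threshold(baseline_year, 69.9)

baseline_pct = pct_at_or_above(baseline_year, grade4_threshold)
later_pct = pct_at_or_above(later_year, grade4_threshold)
print(f"threshold = {grade4_threshold:.3f}")
print(f"baseline year: {baseline_pct:.1f}%")
print(f"later year:    {later_pct:.1f}%")
```

The threshold is fixed once, on the baseline distribution; only the percentage at or above it is re-read for each later year.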
Table 4.2 presents the percentages of students achieving above the specified grade boundaries for 
the years 2017 to 2019. Confidence intervals for percentages are provided in brackets alongside 
the estimates. This is important as it shows that although there have been declines in performance, 
these are often within the confidence intervals. The statistical interpretation of the differences is 
discussed below. 


Table 4.2 Estimated percentages at grade boundaries in English 


Threshold Grade 4 and above Grade 5 and above Grade 7 and above 
2017 69.9 (67.9-71.8) 53.3 (51.5-55.2) 16.8 (15.5-18.1) 
2018 68.8 (66.8-70.7) 52.8 (50.6-54.9) 16.8 (15.3-18.2) 
2019 65.8 (64.1-67.5) 49.8 (47.8-51.9) 16.0 (14.7-17.4) 


Note that, because of the way in which they have been computed, the 2017 NRT percentages 
match closely with the GCSE percentages. The confidence intervals for them reflect the fact that 
the NRT 2017 outcomes carry the statistical error inherent in a sample survey, as per the 
subsequent years. 


Since the 2017 and 2018 percentages have been re-estimated following the concurrent calibration 
with the 2019 data, these figures differ slightly from those reported in previous years. Some degree 
of variation is expected given the addition of more data, and the differences seen are well within 
the confidence intervals for the estimates. 


Table 4.3 shows the half widths of the confidence intervals. The table illustrates that the precision 
for 2019 is slightly improved on previous years. 


Table 4.3. English NRT half width of confidence intervals each year 


Threshold Grade 4 and above Grade 5 and above Grade 7 and above 
2017 1.9 1.9 1.3 
2018 2.0 2.1 1.4 
2019 1.7 2.0 1.4 
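

The half widths can be recovered from the interval bounds in Table 4.2, as in the sketch below. Because the published bounds are themselves rounded to one decimal place, values calculated this way may differ from Table 4.3 by up to 0.1 percentage points. The dictionary layout and function name are illustrative.

```python
# Confidence intervals from Table 4.2 (English), as (lower, upper) bounds.
intervals = {
    2017: {"grade 4": (67.9, 71.8), "grade 5": (51.5, 55.2), "grade 7": (15.5, 18.1)},
    2018: {"grade 4": (66.8, 70.7), "grade 5": (50.6, 54.9), "grade 7": (15.3, 18.2)},
    2019: {"grade 4": (64.1, 67.5), "grade 5": (47.8, 51.9), "grade 7": (14.7, 17.4)},
}


def half_width(lower, upper):
    """Half the width of a confidence interval, in percentage points."""
    return (upper - lower) / 2


half_widths = {
    year: {grade: half_width(*bounds) for grade, bounds in grades.items()}
    for year, grades in intervals.items()
}

for year, grades in half_widths.items():
    print(year, {grade: f"{value:.2f}" for grade, value in grades.items()})
```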


Figure 4.1 presents 95% confidence intervals around the percentages achieving at least the 
specified grade boundary in 2019, as compared to 2018 and the 2017 population baseline 
percentages. The 2017 population percentages are represented as dotted lines and the trend lines 
across years as solid lines. This format has been used to encourage the reader to compare the 
2019 and 2018 point estimate confidence bands with the 2017 baseline population percentages, 
bearing in mind the confidence intervals. 


Figure 4.1 Chart showing performance each year as compared to the 2017 
population percentiles (with trend lines) 


[Figure: horizontal axis: year (2017, 2018, 2019); series: grade 4 and above, grade 5 and above, 
grade 7 and above.]


A key question arising for the NRT results in a given year is to determine if there is a statistically 
significant difference from the results in the previous year. For the NRT, several comparisons are 
being made and this gives rise to a danger that changes that arise by chance may seem real. 
Hence the criteria for significance which have been used are adjusted for multiple comparisons. 
(For more information see Appendix A.) 


Using this method for multiple comparisons, the NRT English data show that the percentage of 
students at grade 4 and above in 2019 is significantly lower than in 2017, the baseline year, at the 
5% level of significance, but not at the 1% level. The differences from 2017 to 2019 for grade 5 
and above and for grade 7 and above were not significant. Nor were there 
any significant differences from 2017 to 2018 or from 2018 to 2019 for any of the three grades.³ 


3 The results of a given year’s NRT can be compared with the NRT study of 2017 (both are sample surveys, 
and the statistical error is therefore reflected in confidence intervals for 2017) or with the GCSE percentages 
of 2017, regarded as external constants. The 2018 Results Digest reported comparisons with the GCSE 
2017 population percentages. However, in order to make ongoing comparisons from year to year it was 
decided for 2019 onwards that comparing the outcomes of all NRT studies (including the 2017 NRT study, 
rather than 2017 GCSE percentages) would be more informative, and these differences are the focus of the 
2019 report. 
5 Performance in maths in 2019 


The objective of the National Reference Test is to get precise estimates of the percentages of 
students each year achieving at a level equivalent to three key GCSE grades in 2017: these key 
grades are 4, 5 and 7. For the NRT in 2017, these baseline percentages were established from the 
2017 GCSE population percentages. The NRT ability distribution, based on the IRT analysis, was 
then used to establish the ability scores which corresponded to those percentages. From 2018 
onwards, they will correspond to the same student ability as those of 2017, thus allowing the 
tracking of performance. Alongside this, based on the sample achieved and the reliability of the 
tests, we are able to model the level of precision with which the proportion of students achieving 
the ability scores can be measured. In this context, the precision is half of the 95% confidence 
intervals for the measurement of these percentages. The target for the NRT is to achieve a 95% 
confidence interval of plus or minus not more than 1.5 percentage points from the estimate at each 
ability threshold. 


Ofqual provided the percentages of students at or above three relevant grades (grades 4, 5 and 7) 
taken from the 2017 GCSE population. These are shown in Table 5.1. These percentages were 
mapped to three ability threshold scores in the NRT in 2017. 


Table 5.1 Maths 2017 NRT Baseline Thresholds 


Threshold             Percentage of students above threshold from 2017 GCSE
Grade 7 and above     19.9
Grade 5 and above     49.7
Grade 4 and above     70.7


In 2019, the NRT data for the years 2017 to 2019 were analysed together using Item Response 
Theory (IRT) modelling techniques. By analysing all the data concurrently, ability distributions 
could be produced for the 2017, 2018 and 2019 samples on the same scale. The percentages of 
students at each of the three GCSE grade boundaries, fixed on the 2017 distribution, could then be 
mapped onto the 2018 and 2019 distributions to produce estimates of the percentage of students 
at the same level of ability in those years. For example, the percentage of pupils at the ‘Grade 4 
and above’ threshold in the 2017 GCSE population was 70.7 per cent. This was mapped onto the 
2017 distribution to read off an ability value equivalent to that grade boundary. The same ability 
value on the 2018 and 2019 distributions can then be found, and the percentage of students at this 
threshold or above in those years established. In this way, we are able to estimate the percentage 
of students at the same level of ability as represented in the 2017 GCSE population, for each year 
of the NRT going forward. The precision of these estimates is dependent on both the sample 
achieved and the reliability of the tests as measures. 


Table 5.2 presents the percentages of students achieving above the specified grade boundaries for 
the years 2017 to 2019. Confidence intervals for percentages are provided in brackets alongside 
the estimates. This is important as it shows that although there have been increases in 
performance, these are often within the confidence intervals. The statistical interpretation of the 
differences is discussed below. 


Table 5.2 Estimated percentages at grade boundaries in mathematics 


Threshold Grade 4 and above Grade 5 and above Grade 7 and above 
2017 70.7 (69.3-72.1) 49.7 (48.0-51.3) 19.9 (18.6-21.2) 
2018 73.2 (71.7-74.7) 52.4 (50.7-54.0) 21.5 (20.2-22.8) 
2019 73.1 (71.8-74.4) 51.9 (50.3-53.5) 22.7 (21.4-24.0) 


Note that, because of the way in which they have been computed, the 2017 NRT percentages 
match closely with the GCSE percentages. The confidence intervals for them reflect the fact that 
the NRT 2017 outcomes carry the statistical error inherent in a sample survey, as per the 
subsequent years. 


Since the 2017 and 2018 percentages have been re-estimated following the concurrent calibration 
with the 2019 data, these figures differ slightly from those reported in previous years. Some degree 
of variation is expected given the addition of more data, and the differences seen are well within 
the confidence intervals for the estimates. 


Table 5.3 shows the half widths of the confidence intervals. The table illustrates that the precision 
for 2019 is slightly improved relative to 2017 and 2018. 


Table 5.3. Mathematics NRT half width of confidence intervals each year 


Threshold Grade 4 and above Grade 5 and above Grade 7 and above 
2017 1.4 1.6 1.3 
2018 1.5 1.7 1.3 
2019 1.3 1.6 1.3 
Figure 5.1 presents 95% confidence intervals around the percentages achieving at least the 
specified grade boundary in 2019, as compared to 2018 and the 2017 population baseline 
percentages. The 2017 population percentages are represented as dotted lines and the trend lines 
across years as solid lines. This format has been used to encourage the reader to compare the 
2019 and 2018 point estimate confidence bands with the 2017 baseline population percentages, 
bearing in mind the confidence intervals. 


Figure 5.1 Chart showing performance each year as compared to the 2017 
population percentiles (with trend lines) 


[Figure: horizontal axis: year (2017, 2018, 2019); series: grade 4 and above, grade 5 and above, 
grade 7 and above.]


A key question arising for the NRT results in a given year is to determine if there is a statistically 
significant difference from the results in the previous year. For the NRT, several comparisons are 
being made and this gives rise to a danger that changes that arise by chance may seem real. 
Hence the criteria for significance which have been used are adjusted for multiple comparisons. 
(For more information see Appendix A.) 


Using this method for multiple comparisons, the NRT mathematics data show that the percentage 
of students at grade 7 and above in 2019 is significantly greater than in 2017 at the 5% level of 
significance, but not at the 1% level. The differences from 2017 to 2019 for grade 4 and above 
and for grade 5 and above were not significant. Nor were there any 
significant differences from 2017 to 2018 or from 2018 to 2019 for any of the three grades.⁴ 


4 The results of a given year’s NRT can be compared with the NRT study of 2017 (both are sample surveys, 
and the statistical error is therefore reflected in confidence intervals for 2017) or with the GCSE percentages 
of 2017, regarded as external constants. The 2018 Results Digest reported comparisons with the GCSE 
2017 population percentages. However, in order to make ongoing comparisons from year to year it was 
decided for 2019 onwards that comparing the outcomes of all NRT studies (including the 2017 NRT study, 
rather than 2017 GCSE percentages) would be more informative, and these differences are the focus of the 
2019 report. This explains why the conclusions regarding the changes between 2017 and 2018 differ in this 
report compared with the 2018 Results Digest. 
6 Appendix A: A brief summary of the NRT 


English 


The English test takes one hour to administer and follows the curriculum for the reformed GCSE in 
English language. In each of the eight English test booklets, there are two components; the first is 
a reading test and the second a writing test. Each component carries 25 marks and students are 
advised to spend broadly equal time on each component. 


The reading test is based on an extract from a longer prose text, or two shorter extracts from 
different texts. Students are asked five, six or seven questions that refer to the extracts. Some 
questions of one to four marks require short responses or require the student to select a response 
from options provided. In each booklet, the reading test also includes a 6-mark question and a 10- 
mark question where longer, more in-depth responses need to be given. These focus on analysis 
and evaluation of particular aspects of the text or a comparison between texts. 


The writing test is a single, 25-mark task. This is an extended piece of writing, responding to a 
stimulus. For example, students may be asked to describe, narrate, give and respond to 
information, argue, explain or instruct. 


Maths 


For maths, a separate sample of students is also given one hour to complete the test. The test 
includes questions on number, algebra, geometry and measures, ratio and proportion, and 
statistics and probability — the same curriculum as the reformed GCSE. Each of the eight test 
booklets has 13 or 14 questions with a total of 50 marks and each student takes just one of the test 
booklets. 


Analysis 


The analysis process followed a sequence of steps. Initially, the tests were analysed using 
Classical Test Theory to establish that they had performed well, with appropriate difficulty and 
good levels of reliability. The subsequent analyses used Item Response Theory techniques to link 
all the tests together from 2017 to 2019 and estimate the ability of all the students on a common 
scale for each subject for each year, independent of the test or items they had taken. These ability 
estimates were then used for calculating the ability level at the percentiles associated with the 
GCSE grade boundaries in 2017 and mapping these onto the 2018 and 2019 distributions to 
generate percentile estimates for those years. 


Multiple Comparisons 


The statistical significance of the difference between two percentages estimated in two years, say 
2018 and 2019, may be approached with a two-sample t-statistic. Because of the very large number 
of degrees of freedom, the value can be compared with the standard normal distribution rather than 
the t-distribution. For a comparison of two percentages, say the percentage of students at grade 4 
or higher between two years, the critical value at a significance level of 0.05 (5%) would usually be 
1.96. However, since there are three grade thresholds across multiple years, there are a number of 
comparisons which could be made. As the number of simultaneous comparisons grows, the 
probability that some of them are significant by chance rapidly increases. To ensure that the 
chosen level of significance holds overall, we have implemented an adjustment for multiple 
comparisons. 
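

The effect of such an adjustment on the critical value can be sketched as follows. This appendix does not name the adjustment used, so the Bonferroni method shown here is an assumption, as are the count of nine comparisons (three thresholds by three pairs of years), the treatment of the yearly samples as independent, and the recovery of standard errors from the published confidence intervals.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def standard_error(lower, upper, level=0.95):
    """Recover a standard error from the bounds of a confidence interval."""
    z = STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return (upper - lower) / 2 / z


def z_statistic(pct_a, ci_a, pct_b, ci_b):
    """z for the difference of two percentages, assuming independent samples."""
    se_diff = sqrt(standard_error(*ci_a) ** 2 + standard_error(*ci_b) ** 2)
    return (pct_b - pct_a) / se_diff


def bonferroni_critical_value(alpha, n_comparisons):
    """Two-sided critical z after a Bonferroni adjustment (an assumed method)."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / (2 * n_comparisons))


# English, grade 4 and above, 2017 versus 2019 (figures from Table 4.2).
z = z_statistic(69.9, (67.9, 71.8), 65.8, (64.1, 67.5))

# Assumption: nine comparisons (three thresholds x three pairs of years).
N_COMPARISONS = 9
unadjusted_5 = bonferroni_critical_value(0.05, 1)
adjusted_5 = bonferroni_critical_value(0.05, N_COMPARISONS)
adjusted_1 = bonferroni_critical_value(0.01, N_COMPARISONS)

print(f"z = {z:.2f}")
print(f"unadjusted 5% critical value: {unadjusted_5:.2f}")
print(f"adjusted 5% critical value:   {adjusted_5:.2f}")
print(f"adjusted 1% critical value:   {adjusted_1:.2f}")
```

Under these assumptions the adjusted critical values are noticeably larger than 1.96, and the absolute value of the statistic for English grade 4 falls between the adjusted 5% and 1% critical values, in line with the finding reported in Section 4. The actual NRT procedure may differ in its details.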